69 research outputs found

    Probing a Continuum of Macro-molecular Assembly Models with Graph Templates of Complexes

    Reconstruction by data integration is an emerging approach to modeling large protein assemblies, but uncertainties in the input data yield average models whose quantitative interpretation is challenging. This paper presents methods to probe fuzzy models of large assemblies against atomic-resolution models of sub-systems. More precisely, consider a Toleranced Model (TOM) of a macro-molecular assembly, namely a continuum of nested shapes representing the assembly at multiple scales. Also consider a template, namely an atomic-resolution 3D model of a sub-system (a complex) of this assembly. We present graph-based algorithms performing a multi-scale assessment of the complexes of the TOM, by comparing the pairwise contacts which appear in the TOM against those of the template. We apply this machinery to recent average models of the Nuclear Pore Complex, and compare our observations with the latest experimental work.
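The contact comparison described in this abstract can be illustrated with a minimal sketch. It assumes contacts are given as unordered pairs of protein labels; the labels and the function names `contact_set` and `compare_contacts` are hypothetical, and the code does not reproduce the paper's graph-based algorithms or its multi-scale analysis.

```python
def contact_set(pairs):
    """Turn an iterable of (protein, protein) pairs into a set of unordered contacts."""
    return {frozenset(pair) for pair in pairs}


def compare_contacts(model_pairs, template_pairs):
    """Classify contacts as shared, template-only, or model-only.

    Both inputs are iterables of label pairs; order within a pair is ignored.
    """
    model = contact_set(model_pairs)
    template = contact_set(template_pairs)
    return {
        "confirmed": model & template,  # present in both model and template
        "missing": template - model,    # in the template, absent from the model
        "extra": model - template,      # in the model, absent from the template
    }


# Hypothetical labels, used only for illustration.
model_pairs = [("A", "B"), ("B", "C"), ("C", "D")]
template_pairs = [("B", "A"), ("A", "C")]
report = compare_contacts(model_pairs, template_pairs)
```

In a multi-scale setting one would presumably repeat such a comparison for the contact set observed at each scale of the model; that loop is left out here.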

    Multi-scale Geometric Modeling of Ambiguous Shapes with Toleranced Balls and Compoundly Weighted alpha-shapes

    Also available as INRIA Tech Report 7306. Dealing with ambiguous data is a challenge in Science in general and geometry processing in particular. One route of choice to extract information from such data consists of replacing the ambiguous input by a continuum, typically a one-parameter family, so as to mine stable geometric and topological features within this family. This work follows this spirit and introduces a novel framework to handle 3D ambiguous geometric data which are naturally modeled by balls. First, we introduce toleranced balls to model ambiguous geometric objects. A toleranced ball consists of two concentric balls, and interpolating between their radii provides a way to explore a range of possible geometries. We propose to model an ambiguous shape by a collection of toleranced balls, and show that the aforementioned radius interpolation is tantamount to the growth process associated with an additively-multiplicatively weighted Voronoi diagram (also called compoundly weighted or CW). Second and third, we investigate properties of the CW diagram and the associated CW α-complex, which provides a filtration called the λ-complex. Fourth, we sketch a naive algorithm to compute the CW VD. Finally, we use the λ-complex to assess the quality of models of large protein assemblies, as these models inherently feature ambiguities.
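The radius interpolation of a toleranced ball can be sketched as follows. This is an illustration only: it assumes a linear interpolation between the inner and outer radii, controlled by a parameter `lam`, and the names `TolerancedBall` and `contact_lambda` are hypothetical rather than taken from the paper or its software.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TolerancedBall:
    """Two concentric balls sharing a center, with inner and outer radii."""
    center: tuple
    r_inner: float
    r_outer: float

    def radius(self, lam):
        """Radius at parameter lam (assumed linear: inner at 0, outer at 1)."""
        return self.r_inner + lam * (self.r_outer - self.r_inner)


def contact_lambda(a, b):
    """Smallest lam >= 0 at which the two grown balls touch.

    Returns 0.0 if the inner balls already intersect, and math.inf if
    neither ball grows and they are disjoint.
    """
    distance = math.dist(a.center, b.center)
    gap = distance - (a.r_inner + b.r_inner)
    growth = (a.r_outer - a.r_inner) + (b.r_outer - b.r_inner)
    if gap <= 0:
        return 0.0
    if growth == 0:
        return math.inf
    return gap / growth


a = TolerancedBall(center=(0.0, 0.0, 0.0), r_inner=1.0, r_outer=2.0)
b = TolerancedBall(center=(4.0, 0.0, 0.0), r_inner=1.0, r_outer=2.0)
touch = contact_lambda(a, b)  # the two balls meet exactly when both reach radius 2
```

Because each ball grows at its own rate, the value of `lam` at which two balls first meet depends on both the distance between centers and the two tolerances, which is what ties the growth process to a weighted Voronoi diagram.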

    The Structural Bioinformatics Library: modeling in biomolecular science and beyond

    Motivation: Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library (SBL, http://sbl.inria.fr), a generic C++/python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performance. Results: The SBL involves four software components (1-4 hereafter). For end-users, the SBL provides ready-to-use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, and to model macro-molecular assemblies. These tools can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C++ toolbox with modular design, involving core (2) algorithms, (3) biophysical models, and (4) modules, the latter being especially suited to develop novel applications. The SBL comes with thorough documentation consisting of user and reference manuals, and a bugzilla platform to handle community feedback. Availability: The SBL is available fro

    Multi-scale Geometric Modeling of Ambiguous Shapes with Toleranced Balls and Compoundly Weighted alpha-shapes

    Dealing with ambiguous data is a challenge in Science in general and geometry processing in particular. One route of choice to extract information from such data consists of replacing the ambiguous input by a continuum, typically a one-parameter family, so as to mine stable geometric and topological features within this family. This work follows this spirit and introduces a novel framework to handle 3D ambiguous geometric data which are naturally modeled by balls. First, we introduce toleranced balls to model ambiguous geometric objects. A toleranced ball consists of two concentric balls, and interpolating between their radii provides a way to explore a range of possible geometries. We propose to model an ambiguous shape by a collection of toleranced balls, and show that the aforementioned radius interpolation is tantamount to the growth process associated with an additively-multiplicatively weighted Voronoi diagram (also called compoundly weighted or CW). Second and third, we investigate properties of the CW diagram and the associated CW α-complex, which provides a filtration called the λ-complex. Fourth, we propose a naive algorithm to compute the CW VD. Finally, we use the λ-complex to assess the quality of models of large protein assemblies, as these models inherently feature ambiguities.

    Assessing the Reconstruction of Macro-molecular Assemblies: the Example of the Nuclear Pore Complex

    The reconstruction of large protein assemblies is a major challenge due to their plasticity and due to the flexibility of the proteins involved. An emerging trend to cope with these uncertainties consists of performing the reconstruction by integrating experimental data from several sources, a strategy recently used to propose qualitative reconstructions of the Nuclear Pore Complex (NPC). Yet, the absence of clearly identified canonical reconstructions and the lack of quantitative assessment with respect to the experimental data are detrimental to the mechanistic exploitation of the results. To leverage such reconstructions, this work proposes a modeling framework inherently accommodating uncertainties, and allowing a precise assessment of the reconstructed models. We make three contributions. First, we introduce toleranced models to accommodate the positional and conformational uncertainties of protein instances within large assemblies. A toleranced model is a continuum of geometries whose distinct topologies can be enumerated, and mining stable complexes amidst this finite set hints at important structures in the assembly. Second, we present a panoply of tools to perform a multi-scale topological, geometric, and biochemical assessment of the complexes associated with a toleranced model, at the assembly and local levels. At the assembly level, we assess the prominence of contacts and the quality of the reconstruction, in particular with respect to symmetries. At the local level, the complexes encountered in the toleranced model are used to confirm, question, or suggest protein contacts within a 3D template known at atomic resolution. Third, we apply our machinery to the NPC, for which we (i) report prominent contacts uncovering sub-complexes of the NPC, (ii) explain the closure of the two rings involving 16 copies of the Y-complex, and (iii) develop a new 3D template for the T-complex. These contributions should prove instrumental in enhancing the reconstruction of assemblies, and in selecting the models which best comply with experimental data.

    Greedy Geometric Optimization Algorithms for Collection of Balls

    Modeling 3D objects with balls is routine for two reasons: on the one hand, the medial axis transform allows representing a solid object as a union of medial balls; on the other hand, selected shapes, and molecules in particular, are naturally represented by collections of balls. Yet, the problem of choosing which balls are best suited to approximate a given shape is a non-trivial one. This paper addresses two problems in this realm. The first one, conformational diversity selection, consists of choosing k molecular conformations amidst n, so as to maximize the geometric diversity of the k conformers. The second one, inner approximation, consists of approximating a molecule of n balls with k < n balls. On the theoretical side, we demonstrate that for both problems, a geometric generalization of max k-cover applies, with weights depending on the cells of a surface or volumetric arrangement. Tackling these problems with greedy strategies, we show that the 1 - 1/e bound known in combinatorial optimization applies in some cases but not all. On the applied side, we present a robust and effective implementation of the greedy algorithm for the inner approximation problem, which incorporates the calculation of (i) the exact Delaunay triangulation of points whose coordinates are degree-two algebraic numbers, (ii) the medial axis of a union of balls, and (iii) a certified estimate of the volume of a union of balls. In particular, we show that the inner approximation of complex molecules yields accurate coarse-grain models with a number of balls 100 times smaller than the number of atoms, a key requirement to simulate crowded protein environments.
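The greedy strategy mentioned in this abstract can be illustrated on the classical weighted max k-cover problem. The sketch below is a combinatorial stand-in under stated assumptions: cells are abstract identifiers with given weights, whereas in the paper the weights come from the cells of a surface or volumetric arrangement. The function name `greedy_max_k_cover` and the toy data are hypothetical.

```python
def greedy_max_k_cover(candidates, weights, k):
    """Pick up to k candidates greedily, each time maximizing newly covered weight.

    candidates: dict mapping a candidate name to the set of cell ids it covers.
    weights:    dict mapping a cell id to a non-negative weight.
    Returns (list of chosen names in selection order, total covered weight).
    """
    remaining = dict(candidates)
    covered = set()
    chosen = []

    def gain(name):
        # Weight of the cells this candidate would add to the current cover.
        return sum(weights[cell] for cell in remaining[name] - covered)

    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # nothing left to gain, stop early
        chosen.append(best)
        covered |= remaining.pop(best)

    return chosen, sum(weights[cell] for cell in covered)


# Toy instance with unit weights (illustrative only).
candidates = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1}}
weights = {cell: 1.0 for cell in range(1, 7)}
chosen, total = greedy_max_k_cover(candidates, weights, k=2)
```

For the classical problem this greedy rule achieves the 1 - 1/e approximation factor; the abstract states that the bound carries over to the geometric setting in some cases but not all.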

    PALSE: Python Analysis of Large Scale (Computer) Experiments

    A tenet of Science is the ability to reproduce results, and a related issue is the possibility to archive and interpret the raw results of (computer) experiments. This paper presents an elementary Python framework addressing this latter goal. Consider a computing pipeline consisting of raw data generation, raw data parsing, and data analysis, i.e. graphical and statistical analysis. palse addresses these last two steps by leveraging the hierarchical structure of XML documents. More precisely, assume that the raw results of a program are stored in XML format, possibly generated by the serialization mechanism of the Boost C++ libraries. For raw data parsing, palse imports the raw data as XML documents, and exploits the tree structure of the XML together with the XML Path Language to access and select specific values. For graphical and statistical analysis, palse gives direct access to ScientificPython, R, and gnuplot. In a nutshell, palse combines standard languages (Python, XML, XML Path Language) and tools (Boost serialization, ScientificPython, R, gnuplot) in such a way that once the raw data have been generated, graphical plots and statistical analysis just require a handful of lines of Python code. The framework applies to virtually any type of data, and may find a broad class of applications.
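The parse-then-analyze workflow described here can be sketched with the Python standard library alone. This does not use or reproduce the palse API: the XML layout, the tag names, and the helper `select_values` are assumptions made for illustration, and `xml.etree.ElementTree` supports only a subset of the XML Path Language.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical raw output of three runs of some program, stored as XML.
RAW_XML = """
<experiment>
  <run id="1"><volume>10.0</volume></run>
  <run id="2"><volume>12.0</volume></run>
  <run id="3"><volume>14.0</volume></run>
</experiment>
"""


def select_values(xml_text, path):
    """Parse the XML and return the numeric text of every node matching path."""
    root = ET.fromstring(xml_text)
    return [float(node.text) for node in root.findall(path)]


# Select one value per run with a path expression, then summarize.
volumes = select_values(RAW_XML, "./run/volume")
mean_volume = statistics.mean(volumes)
spread = statistics.stdev(volumes)
```

Keeping the selection step as a single path expression means the analysis code does not change when more runs are appended to the raw XML.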

    Hybridizing rapidly growing random trees and basin hopping yields a more efficient exploration of energy landscapes

    The number of local minima of the potential energy landscape (PEL) of molecular systems generally grows exponentially with the number of degrees of freedom, so that a crucial property of PEL exploration algorithms is their ability to identify local minima which are low lying and diverse. In this work, we present a new exploration algorithm, retaining the ability of basin hopping (BH) to identify local minima, and that of transition based rapidly growing random trees (T-RRT) to foster the exploration of yet unexplored regions. This ability is obtained by interleaving calls to the extension procedures of BH and T-RRT, and we show that tuning the balance between these two types of calls allows the algorithm to focus on low lying regions. Computational efficiency is obtained using state-of-the-art data structures, in particular for searching approximate nearest neighbors in metric spaces. We present results for BLN69, a protein model whose conformational space has dimension 207 and whose PEL has been studied exhaustively. On this system, we show that the propensity of our algorithm to explore low lying regions of the landscape significantly exceeds that of BH and T-RRT.